Abstract. The modeling flexibility provided by hypergraphs has drawn a lot of interest from
the combinatorial scientific community, leading to novel models and algorithms, their applications,
and the development of associated tools. Hypergraphs are now a standard tool in combinatorial
scientific computing. The modeling flexibility of hypergraphs, however, comes at a cost: algorithms on
hypergraphs are inherently more complicated than those on graphs, which sometimes translates to
nontrivial increases in processing times. Neither the modeling flexibility of hypergraphs nor the
runtime efficiency of graph algorithms can be overlooked. Therefore, the new research thrust should
be how to cleverly trade off between the two. This work addresses one method for this trade-off by
solving the hypergraph partitioning problem by finding vertex separators on graphs. Specifically,
we investigate how to solve the hypergraph partitioning problem by seeking a vertex separator on
its net intersection graph (NIG), where each net of the hypergraph is represented by a vertex,
and two vertices share an edge if their nets have a common vertex. We propose a vertex-weighting
scheme to attain good node-balanced hypergraphs, since the NIG model cannot preserve node balancing
information. Vertex-removal and vertex-splitting techniques are described to optimize cutnet and
connectivity metrics, respectively, under the recursive bipartitioning paradigm. We also developed an
implementation for our GPVS-based HP formulations by adopting and modifying a state-of-the-art
GPVS tool, onmetis. Experiments conducted on a large collection of sparse matrices confirmed the
validity of our proposed techniques.

Key words. hypergraph partitioning; combinatorial scientific computing; graph partitioning by
vertex separator; sparse matrices.
1. Introduction. A hypergraph is a generalization of a graph, since it replaces
edges that connect only two vertices, with hyperedges (nets) that can connect multiple 
vertices. This generalization provides a critical modeling flexibility that allows accu- 
rate formulation of many important problems in combinatorial scientific computing. 
Our initial motivations for hypergraph models were accurate modeling of the nonzero 
structure of unsymmetric and rectangular sparse matrices to minimize communication 
volume for iterative solvers El CU H2 H31 HI ISH HS1 HH1 ESI and permutation 
to block-angular form for coarse-grain parallelism [3]. The real impact of these works
turned out to be the introduction of hypergraph models to the combinatorial scientific 
computing community. Since then, the modeling power of hypergraphs appealed to 
many researchers and was applied to a wide variety of parallel and distributed com- 
puting applications such as data aggregation [15], image-space parallel direct volume 
rendering [7], parallel mixed integer linear programming [53], data declustering for
multi-disk databases [351 E2] > scheduling file-sharing tasks in heterogeneous master- 
slave computing environments [33J [3H [37], and work-stealing scheduling J5U]. Hypergraphs
were also applied to applications outside the parallel computing domain
such as road network clustering for efficient query processing [THl [H3 HI] , pattern- 
based data clustering [43], reducing software development and maintenance costs [J],
processing spatial join operations [51], and improving locality in memory or cache
performance Q3 1521 161] . Hypergraphs and hypergraph partitioning are now standard 
tools of combinatorial scientific computing. 

Increasing popularity of hypergraphs has been going hand in hand with the de- 
velopment of effective hypergraph partitioning (HP) tools: wide applicability of hy- 
pergraphs motivated development of fast HP tools, and availability of effective HP 
tools motivated further applications. This virtuous cycle produced sequential HP 
tools such as hMeTiS [35], PaToH JO] and Mondriaan [59], and parallel HP tools
such as Parkway [M] and Zoltan [52], all of which adopt the multilevel framework
successfully. While these tools perform well in terms of both solution
quality and processing times, they are hindered by the inherent complexity of deal- 
ing with hypergraphs. Algorithms on hypergraphs are more difficult both in terms 
of computational complexity and runtime performance, since operations on nets are 
performed on sets of vertices as opposed to pairs of vertices as in graphs. The wide 
interest over the last decade has proven the modeling flexibility of hypergraphs to be 
essential, but the runtime efficiency of graph algorithms cannot be overlooked, either. 
Therefore, we believe that the new research thrust should be how to cleverly trade off
between the modeling flexibility of hypergraphs and the practicality of graphs. 

How can we solve problems that are most accurately modeled with hypergraphs 
using graph algorithms without sacrificing too much from what is really important 
for the application? This question has been asked before, and the motivation was 
either theoretical [25] or practical [TU [55], when the absence of HP tools prompted these
attempts. This earlier body of work investigated the relation between HP and graph 
partitioning by edge separator (GPES), and achieved little success. Today, we are 
facing a more difficult task, as effectiveness of available HP tools sets high standards 
for novel approaches. On the other hand, we can draw upon the progress on related 
problems, in particular the advances in tools for graph partitioning by vertex separator 
(GPVS), which is the main theme of this work. 

We investigate solving the HP problem by finding a vertex separator on the net
intersection graph (NIG) of the hypergraph. In the NIG of a hypergraph, each net
is represented by a vertex, and each node of the hypergraph is replaced with a
clique of the nets connecting that node. A vertex separator on this graph defines
a net separator for the hypergraph. This model was initially studied for circuit
partitioning [2]. While faster algorithms can be designed to find vertex separators
on graphs, the NIG model has a drawback in attaining balanced partitions: once
the nodes of the hypergraph are replaced with cliques, it is impossible to preserve
the node weight information accurately. Therefore, we can view the NIG model as
a way to trade exact modeling power for computational efficiency.

What motivates us to investigate NIGs to solve HP problems arising in scientific
computing applications is that in many applications, the definition of balance cannot
be very precise [3J HU US]; or there are additional constraints that cannot be easily
incorporated into partitioning algorithms and tools [47]; or partitioning is used as part
of a divide-and-conquer algorithm [46]. For instance, hypergraph models can be used
to permute a linear program (LP) constraint matrix to a block angular form for parallel
solution with decomposition methods. Load balance can be achieved by balancing
subproblems during partitioning. However, it is not possible to accurately predict the
solution time of an LP, and equal-sized subproblems only increase the likelihood of
computational balance. Hypergraph models have recently been used to find null-space
bases that have a sparse inverse [46]. This application requires finding a column-space
basis $B$ as a submatrix of a sparse matrix $A$, so that $B^{-1}$ is sparse. Choosing $B$
to have a block angular form limits the fill in $B^{-1}$, but merely a block angular
form for $B$ will not be sufficient, since $B$ has to be nonsingular to be a column-space
basis for $A$. Enforcing numerical or even structural nonsingularity of subblocks
during partitioning is a nontrivial task, if possible at all, and thus partitioning is used as
part of a divide-and-conquer paradigm, where the partitioning phase is followed by a
correction phase if subblocks are singular. Both of these cases present examples
of applications where hypergraphs provide effective models, but balance among parts
is only weakly defined. As we will show in the experiments, the NIG model can
effectively be employed for these applications to achieve high quality solutions in a
shorter time. We show that it is easy to enforce a balance criterion on the internal
nets of hypergraph partitioning by enforcing vertex balancing during the partitioning
of the NIG. However, the NIG model cannot completely preserve the node balancing
information in the hypergraph. We propose a weighting scheme for the vertices of the NIG,
which is quite effective in attaining fairly node-balanced partitions of the hypergraph.
The proposed vertex balancing scheme for NIG partitioning can be easily enhanced to
improve the balancing quality of the hypergraph partitions in a simple post-processing
phase.

The recursive bipartitioning (RB) paradigm is widely used for multiway HP and
is known to produce good solution quality [THUSS]. At each RB step, cutnet removal
and cutnet splitting techniques [9] are adopted to optimize the cutsize according
to the cutnet and connectivity metrics, respectively, which are the most commonly
used cutsize metrics in scientific and parallel computing [3] as well as VLSI layout
design [21 El]. In this paper, we propose separator-vertex removal and separator-vertex
splitting techniques for RB-based partitioning of the NIG, which exactly correspond
to the cutnet removal and cutnet splitting techniques, respectively. We also propose
an implementation for our GPVS-based HP formulations by adopting and modifying
a state-of-the-art GPVS tool used in fill-reducing sparse matrix ordering.

2. Preliminaries. In this section, we will provide the basic definitions and tech- 
niques that will be adopted in the remainder of this paper. 

2.1. Graph Partitioning. An undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is defined as a set
$\mathcal{V}$ of vertices and a set $\mathcal{E}$ of edges. Every edge $e_{ij} \in \mathcal{E}$ connects a pair of distinct
vertices $v_i$ and $v_j$. We use the notation $Adj(v_i)$ to denote the set of vertices adjacent
to vertex $v_i$. We extend this operator to include the adjacency set of a vertex subset
$\mathcal{V}' \subset \mathcal{V}$, i.e., $Adj(\mathcal{V}') = \{v_j \in \mathcal{V} - \mathcal{V}' : v_j \in Adj(v_i) \text{ for some } v_i \in \mathcal{V}'\}$. Two disjoint
vertex subsets $\mathcal{V}_k$ and $\mathcal{V}_\ell$ are said to be adjacent if $Adj(\mathcal{V}_k) \cap \mathcal{V}_\ell \neq \emptyset$ (equivalently
$Adj(\mathcal{V}_\ell) \cap \mathcal{V}_k \neq \emptyset$) and non-adjacent otherwise. The degree $d(v_i)$ of a vertex $v_i$ is
equal to the number of edges incident to $v_i$, i.e., $d(v_i) = |Adj(v_i)|$. A weight $w(v_i) \geq 0$
is associated with each vertex $v_i$.

An edge subset $\mathcal{E}_S$ is a K-way edge separator if its removal disconnects the
graph into at least K connected components. That is, $\Pi_{ES}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K\}$ is
a K-way vertex partition of $\mathcal{G}$ by edge separator $\mathcal{E}_S \subseteq \mathcal{E}$ if each part $\mathcal{V}_k$ is non-empty;
parts are pairwise disjoint; and the union of parts gives $\mathcal{V}$. Edges between the vertices
of different parts belong to $\mathcal{E}_S$ and are called cut (external) edges, and all other edges
are called uncut (internal) edges.

A vertex subset $\mathcal{V}_S$ is a K-way vertex separator if the subgraph induced by
the vertices in $\mathcal{V} - \mathcal{V}_S$ has at least K connected components. That is, $\Pi_{VS}(\mathcal{G}) =
\{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K; \mathcal{V}_S\}$ is a K-way vertex partition of $\mathcal{G}$ by vertex separator $\mathcal{V}_S \subseteq \mathcal{V}$ if
each part $\mathcal{V}_k$ is non-empty; all parts and the separator are pairwise disjoint; parts
are pairwise non-adjacent; and the union of parts and the separator gives $\mathcal{V}$. The
non-adjacency of the parts implies that $Adj(\mathcal{V}_k) \subseteq \mathcal{V}_S$ for each $\mathcal{V}_k$. A vertex $v_i \in \mathcal{V}_k$
is said to be a boundary vertex of part $\mathcal{V}_k$ if it is adjacent to any vertex in $\mathcal{V}_S$. A
vertex separator is said to be narrow if no proper subset of it forms a separator, and wide
otherwise.

The objective of graph partitioning is finding a separator of smallest size subject
to a given balance criterion on the weights of the K parts. The weight $W(\mathcal{V}_k)$ of a
part $\mathcal{V}_k$ is defined as the sum of the weights of the vertices in $\mathcal{V}_k$, i.e.,

$$W(\mathcal{V}_k) = \sum_{v_i \in \mathcal{V}_k} w(v_i), \tag{2.1}$$

and the balance criterion is defined as

$$\max_{1 \le k \le K} W(\mathcal{V}_k) \le (1 + \epsilon)\, W_{avg}, \quad \text{where} \quad W_{avg} = \frac{\sum_{k=1}^{K} W(\mathcal{V}_k)}{K}. \tag{2.2}$$

Here, $W_{avg}$ is the weight each part must have in the case of perfect balance, and $\epsilon$
is the maximum imbalance ratio allowed. We proceed with formal definitions for the
GPES and GPVS problems, both of which are known to be NP-hard [6].
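The balance criterion (2.2) can be checked mechanically. The following is a minimal sketch, assuming parts are given as Python sets of vertex identifiers and weights as a dictionary; the function names `part_weights` and `is_balanced` are illustrative and do not come from any partitioning tool.

```python
from typing import Dict, Hashable, List, Set


def part_weights(parts: List[Set[Hashable]],
                 weight: Dict[Hashable, float]) -> List[float]:
    """W(V_k) for every part: the sum of its vertex weights, as in (2.1)."""
    return [sum(weight[v] for v in part) for part in parts]


def is_balanced(parts: List[Set[Hashable]],
                weight: Dict[Hashable, float],
                eps: float) -> bool:
    """Check max_k W(V_k) <= (1 + eps) * W_avg, as in (2.2).

    W_avg is the average over the K parts only; separator vertices,
    if any, are simply not listed in `parts`.
    """
    loads = part_weights(parts, weight)
    w_avg = sum(loads) / len(loads)
    return max(loads) <= (1.0 + eps) * w_avg


# Toy example (hypothetical data): loads are 3, 3 and 2.
parts = [{"a", "b"}, {"c"}, {"d", "e"}]
weight = {"a": 1, "b": 2, "c": 3, "d": 1, "e": 1}
print(part_weights(parts, weight))
print(is_balanced(parts, weight, eps=0.10), is_balanced(parts, weight, eps=0.15))
```

In this toy instance the heaviest part has weight 3 against an average of 8/3, so the partition violates a 10% imbalance bound but satisfies a 15% bound.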

Definition 2.1 (Problem GPES). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, an integer $K$,
and a maximum allowable imbalance ratio $\epsilon$, the GPES problem is finding a K-way
vertex partition $\Pi_{ES}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K\}$ of $\mathcal{G}$ by edge separator $\mathcal{E}_S$ that satisfies
the balance criterion given in (2.2) while minimizing the cutsize, which is defined as

$$cutsize(\Pi_{ES}) = \sum_{e_{ij} \in \mathcal{E}_S} c(e_{ij}), \tag{2.3}$$

where $c(e_{ij}) \geq 0$ is the cost of edge $e_{ij} = (v_i, v_j)$.

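Definition 2.2 (Problem GPVS). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, an integer $K$, and
a maximum allowable imbalance ratio $\epsilon$, the GPVS problem is finding a K-way vertex
partition $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K; \mathcal{V}_S\}$ of $\mathcal{G}$ by vertex separator $\mathcal{V}_S$ that satisfies the
balance criterion given in (2.2) while minimizing the cutsize, which is defined as one
of

$$\text{a)} \quad cutsize(\Pi_{VS}) = \sum_{v_i \in \mathcal{V}_S} c(v_i), \tag{2.4}$$

$$\text{b)} \quad cutsize(\Pi_{VS}) = \sum_{v_i \in \mathcal{V}_S} c(v_i)\,(\lambda(v_i) - 1), \tag{2.5}$$

where $c(v_i) \geq 0$ is the cost of vertex $v_i$.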

In the general GPVS definition given above, both a weight and a cost are associated
with each vertex. The weights are used in computing loads of parts for balancing,
whereas the costs are utilized in computing the cutsize. In the standard GPVS
definitions in the literature, the weights and costs of the vertices are taken as identical.
The reason for our general GPVS definition will become clear in Section 3.

In the cutsize definition given in (2.4), each separator vertex incurs its cost to the
cutsize, whereas in (2.5), the connectivity of a vertex is considered while incurring
its cost to the cutsize. The connectivity $\lambda(v_i)$ of a vertex $v_i$ denotes the number of
parts connected by $v_i$, where a vertex that is adjacent to any vertex in a part is said
to connect that part.

The techniques for solving GPES and GPVS problems are closely related. An
indirect approach to solve the GPVS problem is to first find an edge separator through
GPES, and then translate it to a vertex separator. After finding an edge separator,
this approach takes vertices adjacent to separator edges as a wide separator to be
refined to a narrow separator, with the assumption that a small edge separator is
likely to yield a small vertex separator. The wide-to-narrow refinement problem [49]
is described as a minimum vertex cover problem on the bipartite graph induced by
the cut edges. A minimum vertex cover can be taken as a narrow separator for the
whole graph, because each cut edge will be incident to at least one vertex in the vertex cover.
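The wide-to-narrow refinement step can be illustrated on a small bipartite graph of cut edges. The sketch below is not taken from any of the cited tools: it assumes a textbook approach, namely a maximum matching found by augmenting paths followed by König's construction of a minimum vertex cover, and the function name `min_vertex_cover_bipartite` is hypothetical.

```python
def min_vertex_cover_bipartite(left, right, edges):
    """Minimum vertex cover of a bipartite graph via Konig's theorem.

    left, right: lists of boundary vertices on the two sides of the cut.
    edges: (l, r) pairs, one per cut edge, with l in left and r in right.
    """
    adj = {l: [] for l in left}
    for l, r in edges:
        adj[l].append(r)

    match_of_left, match_of_right = {}, {}

    def try_augment(l, seen):
        # Depth-first search for an augmenting path that starts at l.
        for r in adj[l]:
            if r in seen:
                continue
            seen.add(r)
            if r not in match_of_right or try_augment(match_of_right[r], seen):
                match_of_right[r] = l
                match_of_left[l] = r
                return True
        return False

    for l in left:
        try_augment(l, set())

    # Alternating search from unmatched left vertices: unmatched edges
    # are followed left-to-right, matched edges right-to-left.
    seen_left = {l for l in left if l not in match_of_left}
    seen_right = set()
    stack = list(seen_left)
    while stack:
        l = stack.pop()
        for r in adj[l]:
            if r in seen_right or match_of_left.get(l) == r:
                continue
            seen_right.add(r)
            partner = match_of_right.get(r)
            if partner is not None and partner not in seen_left:
                seen_left.add(partner)
                stack.append(partner)

    # Konig: cover = (left vertices not reached) + (right vertices reached).
    return (set(left) - seen_left) | seen_right


# Toy cut (hypothetical data): four cut edges between two sides.
left = ["a1", "a2", "a3"]
right = ["b1", "b2"]
edges = [("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a3", "b2")]
cover = min_vertex_cover_bipartite(left, right, edges)
print(sorted(cover))
```

Here the wide separator would contain all five boundary vertices, whereas the cover returned by the sketch contains two vertices and still touches every cut edge.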



2.2. Hypergraph Partitioning. A hypergraph $\mathcal{H} = (\mathcal{U}, \mathcal{N})$ is defined as a set
$\mathcal{U}$ of nodes (vertices) and a set $\mathcal{N}$ of nets among those vertices. We refer to the vertices
of $\mathcal{H}$ as nodes, to avoid the confusion between graphs and hypergraphs. Every net
$n_i \in \mathcal{N}$ connects a subset of nodes, i.e., $n_i \subseteq \mathcal{U}$. The nodes connected by a net $n_i$ are
called pins of $n_i$ and denoted as $Pins(n_i)$. We extend this operator to include the
pin list of a net subset $\mathcal{N}' \subset \mathcal{N}$, i.e., $Pins(\mathcal{N}') = \bigcup_{n_i \in \mathcal{N}'} Pins(n_i)$. The size $s(n_i)$
of a net $n_i$ is equal to the number of its pins, i.e., $s(n_i) = |Pins(n_i)|$. The set of
nets that connect a node $u_j$ is denoted as $Nets(u_j)$. We also extend this operator
to include the net list of a node subset $\mathcal{U}' \subset \mathcal{U}$, i.e., $Nets(\mathcal{U}') = \bigcup_{u_j \in \mathcal{U}'} Nets(u_j)$.
The degree $d(u_j)$ of a node $u_j$ is equal to the number of nets that connect $u_j$,
i.e., $d(u_j) = |Nets(u_j)|$. The total number $p$ of pins denotes the size of $\mathcal{H}$, where
$p = \sum_{n_i \in \mathcal{N}} s(n_i) = \sum_{u_j \in \mathcal{U}} d(u_j)$. A graph is a special hypergraph such that each net
has exactly two pins. A weight $w(u_j)$ is associated with each node $u_j$, whereas a
cost $c(n_i)$ is associated with each net $n_i$. A weight $w(n_i)$ can also be associated with
each net, as we will discuss later in this section.

A net subset $\mathcal{N}_S$ is a K-way net separator if its removal disconnects the
hypergraph into at least K connected components. That is, $\Pi_{\mathcal{U}}(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2, \ldots, \mathcal{U}_K\}$ is a
K-way node partition of $\mathcal{H}$ by net separator $\mathcal{N}_S \subseteq \mathcal{N}$ if each part $\mathcal{U}_k$ is non-empty;
parts are pairwise disjoint; and the union of parts gives $\mathcal{U}$. In a partition $\Pi_{\mathcal{U}}(\mathcal{H})$, a
net that connects any node in a part is said to connect that part. The connectivity
$\lambda(n_i)$ of a net $n_i$ denotes the number of parts connected by $n_i$. Nets connecting
multiple parts belong to $\mathcal{N}_S$ and are called cut (external) (i.e., $\lambda(n_i) > 1$), and uncut
(internal) otherwise (i.e., $\lambda(n_i) = 1$). The set of internal nets of a part $\mathcal{U}_k$ is denoted
as $\mathcal{N}_k$, for $k = 1, \ldots, K$. So, although $\Pi_{\mathcal{U}}(\mathcal{H})$ is defined as a K-way partition on
the node set of $\mathcal{H}$, it can also be considered as inducing a (K+1)-way partition
$\Pi_{\mathcal{N}}(\mathcal{H}) = \{\mathcal{N}_1, \ldots, \mathcal{N}_K; \mathcal{N}_S\}$ on the net set.

As in the GPES and GPVS problems, the objective of the HP problem is finding a
net separator of smallest size subject to a given balance criterion on the weights of the
K parts. The weight $W(\mathcal{U}_k)$ of a part $\mathcal{U}_k$ is defined either as the sum of the weights
of nodes in $\mathcal{U}_k$, i.e.,

$$W(\mathcal{U}_k) = \sum_{u_i \in \mathcal{U}_k} w(u_i), \tag{2.6}$$

or as the sum of weights of internal nets of part $\mathcal{U}_k$, i.e.,

$$W(\mathcal{U}_k) = \sum_{n_i \in \mathcal{N}_k} w(n_i). \tag{2.7}$$

The former and latter part weight computation schemes together with the load
balancing criterion given in (2.2) will be referred to here as node and net balancing,
respectively. We proceed with the formal definition for the HP problem, which is also
known to be NP-hard [41].

Definition 2.3 (Problem HP). Given a hypergraph $\mathcal{H} = (\mathcal{U}, \mathcal{N})$, an integer $K$,
and a maximum allowable imbalance ratio $\epsilon$, the HP problem is finding a K-way node
partition $\Pi_{\mathcal{U}}(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2, \ldots, \mathcal{U}_K\}$ of $\mathcal{H}$ that satisfies the balance criterion given in
(2.2) while minimizing the cutsize, which is defined as one of

$$\text{a)} \quad cutsize(\Pi_{\mathcal{U}}) = \sum_{n_i \in \mathcal{N}_S} c(n_i), \tag{2.8}$$

$$\text{b)} \quad cutsize(\Pi_{\mathcal{U}}) = \sum_{n_i \in \mathcal{N}_S} c(n_i)\,(\lambda(n_i) - 1). \tag{2.9}$$

The cutsize metrics given in (2.8) and (2.9) are referred to as the cut-net and
connectivity metrics, respectively [5l 1131 BTj.
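The two cutsize metrics (2.8) and (2.9) can be computed directly from a node-to-part assignment. The sketch below assumes a hypergraph stored as a dictionary from net identifiers to pin sets and a dictionary `part_of` from nodes to part indices; these data structures and names are illustrative assumptions, not the representation used by any HP tool.

```python
def net_connectivity(pins, part_of):
    """lambda(n): the number of distinct parts that the pins of a net touch."""
    return len({part_of[u] for u in pins})


def cutsizes(nets, cost, part_of):
    """Return (cut-net cutsize (2.8), connectivity cutsize (2.9))."""
    cutnet = 0
    connectivity = 0
    for n, pins in nets.items():
        lam = net_connectivity(pins, part_of)
        if lam > 1:  # n is a cut net, i.e. it belongs to the net separator
            cutnet += cost[n]
            connectivity += cost[n] * (lam - 1)
    return cutnet, connectivity


# Toy hypergraph (hypothetical data) with unit net costs and three parts.
nets = {
    "n1": {"u1", "u2"},
    "n2": {"u2", "u3", "u4"},
    "n3": {"u1", "u3", "u5"},
    "n4": {"u4", "u5"},
}
cost = {n: 1 for n in nets}
part_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 2}
print(cutsizes(nets, cost, part_of))
```

In this instance n1 is internal, n2 and n4 each connect two parts, and n3 connects three, so the cut-net cutsize is 3 and the connectivity cutsize is 1 + 2 + 1 = 4.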

3. Formulating the HP Problem as a GPVS Problem. In this section, 
we first review the previous work on alternative models for solving the HP problem. 
Then, we describe our novel and accurate GPVS-based formulations and present the 
relation between HP and GPVS problems from a matrix theoretical view. Finally, we 
present our implementation based on adapting a state-of-the-art GPVS tool. 

3.1. Alternative Models for Solving the HP Problem. As indicated in the 
survey by Alpert and Kahng [2], hypergraphs are commonly used to represent circuit
netlist connections in solving the circuit partitioning and placement problems in VLSI
layout design. The circuit partitioning problem is to divide a system specification into 
clusters to minimize inter-cluster connections. Other circuit representation models 
were also proposed and used in the VLSI literature including dual hypergraph, clique- 
net graph (CNG) and net-intersection graph (NIG) [2]. Hypergraphs represent circuits 
in a natural way so that the circuit partitioning problem is directly described as an HP 
problem. Thus, these alternative models can be considered as alternative approaches 
for solving the HP problem. 

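The dual of a hypergraph $\mathcal{H} = (\mathcal{U}, \mathcal{N})$ is defined as a hypergraph $\mathcal{H}'$, where
the nodes and nets of $\mathcal{H}$ become, respectively, the nets and nodes of $\mathcal{H}'$. That
is, $\mathcal{H}' = (\mathcal{U}', \mathcal{N}')$ with $Nets(u'_i) = Pins(n_i)$ for each $u'_i \in \mathcal{U}'$ and $n_i \in \mathcal{N}$, and
$Pins(n'_j) = Nets(u_j)$ for each $n'_j \in \mathcal{N}'$ and $u_j \in \mathcal{U}$.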
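The dual construction amounts to transposing the pin lists. A minimal sketch follows, assuming the hypergraph is stored as a dictionary from net identifiers to pin sets (an illustrative representation); note that under this assumption a node connected by no net would not appear in the dual.

```python
def dual_hypergraph(nets):
    """Return the dual as a dict: each node u_j of H becomes a net whose
    pins are Nets(u_j), i.e. the nets of H that connect u_j."""
    dual = {}
    for n, pins in nets.items():
        for u in pins:
            dual.setdefault(u, set()).add(n)
    return dual


# Toy hypergraph (hypothetical data).
nets = {"n1": {"u1", "u2"}, "n2": {"u2", "u3"}}
dual = dual_hypergraph(nets)
print(dual)
# Taking the dual twice recovers the original pin lists.
print(dual_hypergraph(dual) == nets)
```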

In the CNG model, the vertex set of the target graph is equal to the node set of
the given hypergraph. Each net of the given hypergraph is represented by a clique
of vertices corresponding to its pins. The multiple edges between two vertices are
contracted into a single edge, the cost of which is set equal to the sum of the costs
of the edges it represents. If an edge is in the cut set of a GPES, then all nets
represented by this edge are in the cut set of HP. Ideally, no matter how the nodes of a
net are partitioned, the contribution of a cut net to the cutsize should always be one
in a bipartition when unit net costs are assumed. However, the deficiency of the CNG
representation is that it is impossible to achieve such a perfect edge-cost assignment
of the edges, as proved by Ihler et al. [25].

Fig. 3.1. (a) A sample hypergraph $\mathcal{H}$ and (b) the corresponding NIG representation $\mathcal{G}$.

In the NIG representation $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ of a given hypergraph $\mathcal{H} = (\mathcal{U}, \mathcal{N})$, each
vertex $v_i$ of $\mathcal{G}$ corresponds to net $n_i$ of $\mathcal{H}$. Two vertices $v_i, v_j \in \mathcal{V}$ of $\mathcal{G}$ are adjacent
if and only if the respective nets $n_i, n_j \in \mathcal{N}$ of $\mathcal{H}$ share at least one pin, i.e., $e_{ij} \in \mathcal{E}$ if and
only if $Pins(n_i) \cap Pins(n_j) \neq \emptyset$. So,

$$Adj(v_i) = \{v_j : n_j \in \mathcal{N} \text{ and } Pins(n_i) \cap Pins(n_j) \neq \emptyset\}. \tag{3.1}$$

Note that for a given hypergraph $\mathcal{H}$, the NIG $\mathcal{G}$ is well-defined; however, there is no
unique reverse construction [2]. Figures 3.1(a) and 3.1(b), respectively, display a
sample hypergraph $\mathcal{H}$ and the corresponding NIG representation $\mathcal{G}$. In the figure,
the sample hypergraph $\mathcal{H}$ contains 18 nodes and 15 nets, whereas the corresponding
NIG $\mathcal{G}$ contains 15 vertices and 30 edges.
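The NIG construction can be sketched in a few lines. The version below assumes the hypergraph is a dictionary from net identifiers to pin sets and returns adjacency sets over net identifiers; it follows the clique view from the introduction (every node contributes a clique among the nets that connect it). The name `build_nig` is hypothetical.

```python
from itertools import combinations


def build_nig(nets):
    """Net intersection graph: one vertex per net, and an edge between
    two nets whenever they share at least one pin."""
    # Nets(u): the nets that connect each node u.
    nets_of = {}
    for n, pins in nets.items():
        for u in pins:
            nets_of.setdefault(u, set()).add(n)

    adj = {n: set() for n in nets}
    # Each node turns into a clique among the nets that connect it.
    for shared in nets_of.values():
        for a, b in combinations(shared, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Toy hypergraph (hypothetical data), not the sample of Fig. 3.1.
nets = {
    "n1": {"u1", "u2"},
    "n2": {"u2", "u3"},
    "n3": {"u3", "u4"},
    "n4": {"u5"},
}
nig = build_nig(nets)
num_edges = sum(len(nbrs) for nbrs in nig.values()) // 2
print(nig, num_edges)
```

The cost of this construction grows with the sum of squared node degrees, since a node connected by d nets contributes up to d(d-1)/2 edges.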

Both the dual hypergraph and NIG models view the HP problem in terms of
partitioning nets instead of nodes. Kahng [30] and Cong, Hagen, and Kahng [16] exploited
this perspective of the NIG model to formulate the hypergraph bipartitioning problem
as a two-stage process. In the first stage, nets of $\mathcal{H}$ are bipartitioned through 2-way
GPES of its NIG $\mathcal{G}$. The resulting net bipartition induces a partial node bipartition
on $\mathcal{H}$, because the nodes (pins) that are connected only by the nets on one side of
the bipartition can be unambiguously assigned to that side. However, other nodes
may be connected by the nets on both sides of the bipartition. Thus, the second
stage involves finding the best completion of the partial node bipartition, i.e., a part
assignment for the shared nodes such that the cutsize is minimized. This problem is
known as the module (node) contention problem in the VLSI community. Kahng [30]
used a winner-loser heuristic [27], whereas Cong et al. [16] used a matching-based
(IG-match) algorithm for solving the 2-way module contention problem optimally.
Cong, Labio, and Shivakumar [17] extended this approach to K-way HP by using
the dual hypergraph model. In the first stage, a K-way net partition is obtained
through partitioning the dual hypergraph. For the second stage, they formulated the
K-way module contention problem as a min-cost max-flow problem through defining
binding factors between nodes and nets, and a preference function between parts and
nodes.

Here, we point out that the module contention problem encountered in the
second stage of the NIG-based hypergraph bipartitioning approaches [16], [30] is similar
to the wide-to-narrow separator refinement problem encountered in the second stage of
the indirect GPVS approaches widely used in nested-dissection-based low-fill orderings
for sparse matrix factorization. The module contention and separator refinement
algorithms effectively work on the bipartite graph induced by the cut edges of a two-way
GPES of the NIG representation of hypergraphs and the standard graph representation
of sparse matrices, respectively. The winner-loser assignment heuristic [27], [30]
used by Kahng [30] is very similar to the minimum-recovery heuristic proposed by
Leiserson and Lewis [ID] for separator refinement. Similarly, the IG-match algorithm
proposed by Cong et al. [16] is similar to the maximum-matching-based minimum
vertex-cover algorithm [39], [48] used by Pothen, Simon, and Liou [49] for separator
refinement. While not explicitly stated in the literature, these net-bipartitioning-based
HP algorithms using the NIG model can be viewed as trying to solve the HP problem
through an indirect GPVS of the NIG representation.

More recently, Trifunovic and Knottenbelt [54] proposed a coloring-based graph
model for partitioning a special type of hypergraphs that arise in fine-grain (nonzero-based)
partitioning of sparse matrices [131 lllj for parallel matrix-vector multiply. In
such hypergraphs, each node is connected by exactly two nets, and their dual
hypergraphs are bipartite graphs. A K-way edge coloring on this bipartite graph is
decoded as a K-way partitioning of the nodes (nonzeros) of the original hypergraph.
The coloring objective, which is defined in terms of the number of distinct colors
incident to the vertices, correctly models the total interprocessor communication
volume. Since the connectivity cutsize metric (2.9) also correctly models total interprocessor
communication volume, the coloring objective exactly models the connectivity
cutsize metric (2.9). Although this model is proposed for a special type of hypergraphs in
which each node is connected by exactly two nets, the model easily extends to more
general hypergraphs where nodes are connected by an arbitrary number of nets.



3.2. An Accurate Formulation of HP as GPVS on NIG Model. We propose
a net-partitioning-based K-way HP algorithm that avoids the module contention
problem by describing the HP problem as a GPVS problem through the NIG model.
The following theorem lays down the basis for our GPVS-based HP formulation. Let
$\mathcal{G} = (\mathcal{V}, \mathcal{E})$ denote the NIG of a given hypergraph $\mathcal{H} = (\mathcal{U}, \mathcal{N})$. The cost of each net
$n_i$ of $\mathcal{H}$ is assigned as the cost of the respective vertex $v_i$ of $\mathcal{G}$, i.e., $c(v_i) = c(n_i)$.
For brevity of the presentation we assume unit net costs here, but all proposed models
and methods generalize to hypergraphs with non-unit net costs.

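Theorem 1. A K-way vertex partition $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \ldots, \mathcal{V}_K; \mathcal{V}_S\}$ of $\mathcal{G}$ by a
narrow vertex separator $\mathcal{V}_S$ induces a K-way contention-free net partition $\Pi_{\mathcal{N}}(\mathcal{H}) =
\{\mathcal{N}_1 = \mathcal{V}_1, \mathcal{N}_2 = \mathcal{V}_2, \ldots, \mathcal{N}_K = \mathcal{V}_K; \mathcal{N}_S = \mathcal{V}_S\}$ of $\mathcal{H}$ by a net separator $\mathcal{N}_S$.

Proof. By definition of GPVS, we have $Adj(\mathcal{V}_k) \cap \mathcal{V}_\ell = \emptyset$ for $1 \le k < \ell \le K$.
This implies that $Pins(\mathcal{N}_k) \cap Pins(\mathcal{N}_\ell) = \emptyset$ for $1 \le k < \ell \le K$, because if any two
nets $n_i \in \mathcal{N}_k$ and $n_j \in \mathcal{N}_\ell$ shared at least one pin, then there would be an edge
$e_{ij}$ between vertices $v_i \in \mathcal{V}_k$ and $v_j \in \mathcal{V}_\ell$ of $\mathcal{G}$, which would correspond to an edge
between parts $\mathcal{V}_k$ and $\mathcal{V}_\ell$ of $\Pi_{VS}(\mathcal{G})$, contradicting the definition of GPVS. Therefore,
any two nets belonging to two different net parts do not share any pin, thus ensuring
the contention-free property of the net partition $\Pi_{\mathcal{N}}(\mathcal{H})$. □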

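Corollary 1. A K-way contention-free net partition

$$\Pi_{\mathcal{N}}(\mathcal{H}) = \{\mathcal{N}_1 = \mathcal{V}_1, \ldots, \mathcal{N}_K = \mathcal{V}_K; \mathcal{N}_S = \mathcal{V}_S\} \tag{3.2}$$

of $\mathcal{H}$ by a net separator $\mathcal{N}_S$ induces a K-way partial node partition

$$\Pi'_{\mathcal{U}}(\mathcal{H}) = \{\mathcal{U}'_1 = Pins(\mathcal{N}_1), \ldots, \mathcal{U}'_K = Pins(\mathcal{N}_K)\}. \tag{3.3}$$

Let $\mathcal{U}_F$ denote the set of remaining nodes. Note that $\mathcal{U}_F$ also corresponds to the
set of nodes that are only connected by the nets of the separator $\mathcal{N}_S$. That is,

$$\mathcal{U}_F = \mathcal{U} - \bigcup_{k=1}^{K} \mathcal{U}'_k = \{u_i \in \mathcal{U} : Nets(u_i) \subseteq \mathcal{N}_S = \mathcal{V}_S\}. \tag{3.4}$$

The nodes in $\mathcal{U}_F$ will be referred to here as free nodes.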
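Decoding a vertex partition of the NIG into the partial node partition (3.3) and the free-node set (3.4) can be sketched as follows. The representation (nets as a dictionary of pin sets, vertex parts as sets of net identifiers) and the name `decode_net_partition` are assumptions made for illustration; the pairwise-disjointness check mirrors the contention-free property of Theorem 1.

```python
def decode_net_partition(nets, vertex_parts, separator):
    """Map a GPVS of the NIG back to the hypergraph.

    nets:         dict net id -> set of pins (nodes).
    vertex_parts: list of sets of net ids (V_1, ..., V_K).
    separator:    set of net ids (V_S).
    Returns (partial_parts, free_nodes) as in (3.3) and (3.4).
    """
    # U'_k = Pins(N_k): every pin of an internal net of part k.
    partial = [set().union(*(nets[n] for n in part)) for part in vertex_parts]

    # Contention-free check: no node may be claimed by two parts.
    for a in range(len(partial)):
        for b in range(a + 1, len(partial)):
            if partial[a] & partial[b]:
                raise ValueError("parts share a pin; not a vertex separator")

    all_nodes = set().union(*nets.values())
    free_nodes = all_nodes - set().union(*partial)

    # Free nodes are exactly the nodes connected only by separator nets.
    for u in free_nodes:
        assert all(n in separator for n, pins in nets.items() if u in pins)
    return partial, free_nodes


# Toy instance (hypothetical data): the NIG is the path n1 - n2 - n3 - n4.
nets = {
    "n1": {"u1", "u2"},
    "n2": {"u2", "u3"},
    "n3": {"u3", "u4", "u6"},
    "n4": {"u4", "u5"},
}
partial, free = decode_net_partition(nets, [{"n1", "n2"}, {"n4"}], {"n3"})
print(partial, free)
```

In this instance node u6 is connected only by the separator net n3, so it is the single free node and may be assigned to either part.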

Figure 3.2(a) shows a 3-way GPVS $\Pi_{VS}(\mathcal{G})$ of the sample NIG $\mathcal{G}$ given in
Figure 3.1(b). Figure 3.2(b) shows the 3-way partial and complete node partition $\Pi_{\mathcal{U}}(\mathcal{H})$
of the sample $\mathcal{H}$, which is induced by $\Pi_{VS}(\mathcal{G})$. The partial node partition is displayed
with nodes drawn with solid lines, and the complete node partition is achieved by adding
the 2 free nodes (drawn with dashed lines). The sample $\mathcal{H}$ given in Figure 3.1(a) contains
only 2 free nodes, which are $u_{17}$ and $u_{18}$. Comparison of Figures 3.2(a) and 3.2(b)
illustrates that the separator vertices v±,vs and vi 5 of Hvs(G) induce the cut nets 
ni,ng, and nis of II^('H), respectively. 

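For any arbitrary assignment of free nodes, we can construct a complete node
partition in the following form:

$$\Pi_{\mathcal{U}}(\mathcal{H}) = \{\mathcal{U}_1 \supseteq \mathcal{U}'_1, \mathcal{U}_2 \supseteq \mathcal{U}'_2, \ldots, \mathcal{U}_K \supseteq \mathcal{U}'_K\}. \tag{3.5}$$

Note that any K-way node partition of $\mathcal{H}$ inducing the (K+1)-way net partition
$\Pi_{\mathcal{N}}(\mathcal{H})$ has to be in the form above.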

Theorem 2. Given a K-way vertex partition $\Pi_{VS}(\mathcal{G})$ of $\mathcal{G}$ by a narrow vertex
separator $\mathcal{V}_S$, any node partition $\Pi_{\mathcal{U}}(\mathcal{H})$ of $\mathcal{H}$ as constructed according to (3.5)
induces the (K+1)-way net partition $\Pi_{\mathcal{N}}(\mathcal{H}) = \{\mathcal{N}_1 = \mathcal{V}_1, \ldots, \mathcal{N}_K = \mathcal{V}_K; \mathcal{N}_S = \mathcal{V}_S\}$ such
that the connectivity of each cut net in $\mathcal{N}_S$ is greater than or equal to the connectivity
of the corresponding separator vertex in $\mathcal{V}_S$.



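Proof. Let $\Pi_{\mathcal{U}}(\mathcal{H})$ be a node partition of $\mathcal{H}$ as constructed according to (3.5).
Consider the net $n_i \in \mathcal{N}_k$ corresponding to a vertex $v_i \in \mathcal{V}_k$ of $\Pi_{VS}(\mathcal{G})$. Since
$Pins(n_i) \subseteq \mathcal{U}_k$, $n_i$ will be an internal net of node part $\mathcal{U}_k$ for node partition $\Pi_{\mathcal{U}}(\mathcal{H})$.
Consider any net $n_s \in \mathcal{N}_S$ corresponding to separator vertex $v_s \in \mathcal{V}_S$ of $\Pi_{VS}(\mathcal{G})$.
Since $\mathcal{V}_S$ is narrow, there exist at least two vertices $v_{j_1} \in \mathcal{V}_{k_1}$ and $v_{j_2} \in \mathcal{V}_{k_2}$ adjacent
to $v_s$ such that $k_1 \neq k_2$. Then, the corresponding nets $n_{j_1}$ and $n_{j_2}$ are internal nets of
$\mathcal{U}_{k_1}$ and $\mathcal{U}_{k_2}$, respectively, by Theorem 1. Since the vertices $v_{j_1}$ and $v_{j_2}$ are adjacent to
$v_s$ in $\mathcal{G}$, there exist two nodes $u_{h_1}$ and $u_{h_2}$ such that $u_{h_1} \in Pins(n_{j_1}) \cap Pins(n_s)$ and
$u_{h_2} \in Pins(n_{j_2}) \cap Pins(n_s)$ by the NIG definition. By the construction of $\Pi_{\mathcal{U}}(\mathcal{H})$,
since $Pins(n_{j_1}) \subseteq \mathcal{U}_{k_1}$ and $Pins(n_{j_2}) \subseteq \mathcal{U}_{k_2}$, we have $u_{h_1} \in \mathcal{U}_{k_1}$ and $u_{h_2} \in \mathcal{U}_{k_2}$,
which in turn implies that $n_s$ is a cut net for the node partition $\Pi_{\mathcal{U}}(\mathcal{H})$. Therefore
$\Pi_{\mathcal{U}}(\mathcal{H})$ induces the net partition $\Pi_{\mathcal{N}}(\mathcal{H}) = \{\mathcal{N}_1 = \mathcal{V}_1, \ldots, \mathcal{N}_K = \mathcal{V}_K; \mathcal{N}_S = \mathcal{V}_S\}$.

Consider the connectivity of the net $n_s \in \mathcal{N}_S$ corresponding to a separator vertex
$v_s \in \mathcal{V}_S$ of $\Pi_{VS}(\mathcal{G})$. Since $Pins(\mathcal{N}_k) \subseteq \mathcal{U}_k$ for $1 \le k \le K$, if vertex part $\mathcal{V}_k$
of $\Pi_{VS}(\mathcal{G})$ contains a vertex $v_j \in \mathcal{V}_k$ that is adjacent to $v_s$, then node part $\mathcal{U}_k$ of
$\Pi_{\mathcal{U}}(\mathcal{H})$ contains a node $u_h$ such that $u_h \in Pins(n_s) \cap Pins(n_j)$. Thus, if a separator
vertex $v_s \in \mathcal{V}_S$ connects vertex part $\mathcal{V}_k$, the net $n_s \in \mathcal{N}_S$ also connects node part
$\mathcal{U}_k$. The connectivity of net $n_s$ may become strictly greater than that of vertex $v_s$
if $n_s$ connects a free node $u_f$ assigned to a part $\mathcal{U}_\ell$ that is not connected by $n_s$ in the
partial node partition $\Pi'_{\mathcal{U}}(\mathcal{H})$, i.e., $\mathcal{U}'_\ell \cap Pins(n_s) = \emptyset$. □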

Corollary 2. Given a K-way vertex partition $\Pi_{VS}(\mathcal{G})$ of $\mathcal{G}$ by a narrow vertex
separator $\mathcal{V}_S$, the separator size of $\Pi_{VS}(\mathcal{G})$ is equal to the cutsize of the node partition
$\Pi_{\mathcal{U}}(\mathcal{H})$ induced by $\Pi_{VS}(\mathcal{G})$ according to the cutnet metric, whereas the separator size
of $\Pi_{VS}(\mathcal{G})$ approximates the cutsize of the node partition $\Pi_{\mathcal{U}}(\mathcal{H})$ induced by $\Pi_{VS}(\mathcal{G})$
according to the connectivity metric.



Comparison of Figures 3.2(a) and 3.2(b) illustrates that the connectivities of
separator vertices in $\Pi_{VS}$ are exactly equal to those of the cut nets of the induced partial
node partition $\Pi'_{\mathcal{U}}(\mathcal{H})$. Figure 3.2(b) shows a 3-way complete node partition $\Pi_{\mathcal{U}}(\mathcal{H})$
obtained by assigning the free nodes (shown with dashed lines) $u_{17}$ and $u_{18}$ to parts
$\mathcal{U}_3$ and $\mathcal{U}_1$, respectively. This free node assignment does not increase the
connectivities of the cut nets. However, a different free node assignment might increase the
connectivities of the cut nets. For example, assigning free node $u_{17}$ to part $\mathcal{U}_2$ instead
of $\mathcal{U}_3$ will increase the connectivity of a cut net by 1.
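The effect of a free-node assignment on connectivity can be quantified before committing to it. The sketch below is an illustration only and not the assignment rule used in this work: it assumes a partial assignment stored in a dictionary `part_of` that omits free nodes, and the helper names `connectivity_increase` and `cheapest_parts` are hypothetical.

```python
def connectivity_increase(u, k, nets, part_of):
    """Number of nets on free node u whose connectivity grows by one
    if u is assigned to part k.

    part_of holds only already-assigned nodes; free nodes are absent.
    """
    increase = 0
    for pins in nets.values():
        if u not in pins:
            continue
        touched = {part_of[v] for v in pins if v in part_of}
        if k not in touched:
            increase += 1
    return increase


def cheapest_parts(u, num_parts, nets, part_of):
    """Parts to which u can be assigned with the smallest increase."""
    scores = [connectivity_increase(u, k, nets, part_of)
              for k in range(num_parts)]
    best = min(scores)
    return [k for k, s in enumerate(scores) if s == best]


# Toy instance (hypothetical data): ns is the only separator net and
# u7 is a free node because it is connected only by ns.
nets = {
    "n1": {"u1", "u2"},
    "n2": {"u3", "u4"},
    "n3": {"u5", "u6"},
    "ns": {"u2", "u3", "u7"},
}
part_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 2, "u6": 2}
print([connectivity_increase("u7", k, nets, part_of) for k in range(3)])
print(cheapest_parts("u7", 3, nets, part_of))
```

In this instance ns already connects parts 0 and 1, so placing u7 in either of them leaves the connectivity of ns at 2, whereas placing it in part 2 raises it to 3. A balance-aware rule could break ties among the cheapest parts by current part weight.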

3.2.1. Recursive-bipartitioning-based partitioning. In the recursive
bipartitioning (RB) paradigm, a hypergraph is first partitioned into 2 parts. Then, each
part of the bipartition is further bipartitioned recursively until the desired number of
parts, K, is achieved. The following corollary forms the basis for the use of RB-based
GPVS for RB-based HP according to the connectivity and the cut-net metrics.

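Corollary 3. Let $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2; \mathcal{V}_S\}$ be a partition of $\mathcal{G}$ by a vertex
separator $\mathcal{V}_S$, and let $\Pi_U(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2\}$ be a node partition of $\mathcal{H}$ that induces the net
partition $\Pi_N(\mathcal{H}) = \{\mathcal{N}_1 = \mathcal{V}_1, \mathcal{N}_2 = \mathcal{V}_2; \mathcal{N}_S = \mathcal{V}_S\}$. The connectivity of a net $n_i$ in
$\Pi_U(\mathcal{H})$ is equal to the connectivity of the corresponding vertex $v_i$ in $\Pi_{VS}(\mathcal{G})$.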

Separator-vertex removal: In RB-based multiway HP, the cut-net metric is formulated
by cut-net removal after each RB step. In this method, after each hypergraph
bipartitioning step, each cut net is discarded from further RB steps. That is, a node
bipartition $\Pi_U(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2\}$ of the current hypergraph $\mathcal{H}$, which induces the net
bipartition $\Pi_N(\mathcal{H}) = \{\mathcal{N}_1, \mathcal{N}_2; \mathcal{N}_S\}$, is decoded as generating two sub-hypergraphs
$\mathcal{H}_1 = (\mathcal{U}_1, \mathcal{N}_1)$ and $\mathcal{H}_2 = (\mathcal{U}_2, \mathcal{N}_2)$ for further RB steps. Hence, the total cutsize of
the resulting multiway partition of $\mathcal{H}$ according to the cut-net metric will be equal
to the sum of the number of cut nets of the bipartition obtained at each RB step.

The cut-net metric can be formulated in the RB-GPVS-based multiway HP by
separator-vertex removal so that each separator vertex is discarded from further RB
steps. That is, at each RB step, a 2-way vertex separator $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2; \mathcal{V}_S\}$ of
$\mathcal{G}$ is decoded as generating two subgraphs $\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1)$ and $\mathcal{G}_2 = (\mathcal{V}_2, \mathcal{E}_2)$, where
$\mathcal{E}_1$ and $\mathcal{E}_2$ denote the internal edges of vertex parts $\mathcal{V}_1$ and $\mathcal{V}_2$, respectively. In other
words, $\mathcal{G}_1$ and $\mathcal{G}_2$ are the subgraphs of $\mathcal{G}$ induced by the vertex parts $\mathcal{V}_1$ and $\mathcal{V}_2$,
respectively. $\mathcal{G}_1$ and $\mathcal{G}_2$ constructed in this way become the NIG representations of
hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. Hence, the sum of the number of separator
vertices of the 2-way GPVS obtained at each RB step will be equal to the total cutsize
of the resulting multiway partition of $\mathcal{H}$ according to the cut-net metric.
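
Separator-vertex removal amounts to taking two induced subgraphs and counting the discarded separator vertices. The following is a minimal sketch, assuming the NIG is stored as a dictionary of adjacency sets and that separator vertices have unit cost; the function names and the toy graph are illustrative only.

```python
def induced_subgraph(adj, part):
    """Subgraph of `adj` induced by the vertex set `part`."""
    part = set(part)
    return {v: adj[v] & part for v in part}


def remove_separator(adj, v1, v2, vs):
    """Decode a 2-way vertex separator {V1, V2; VS} by separator-vertex removal."""
    g1 = induced_subgraph(adj, v1)
    g2 = induced_subgraph(adj, v2)
    # With unit costs, |VS| is the number of cut nets charged to this RB step.
    return g1, g2, len(vs)


# Toy NIG: the path a - b - s - c - d, separated by {s}.
adj = {
    "a": {"b"},
    "b": {"a", "s"},
    "s": {"b", "c"},
    "c": {"s", "d"},
    "d": {"c"},
}
g1, g2, cut = remove_separator(adj, {"a", "b"}, {"c", "d"}, {"s"})
```

Summing the third return value over all RB steps gives the separator size, which by the argument above equals the cut-net cutsize.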



Separator-vertex splitting: In RB-based multiway HP, the connectivity metric
is formulated by adopting the cut-net splitting method after each RB step. In
this method, at each RB step, $\Pi_U(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2\}$ is decoded as generating two
sub-hypergraphs $\mathcal{H}_1 = (\mathcal{U}_1, \mathcal{N}_1)$ and $\mathcal{H}_2 = (\mathcal{U}_2, \mathcal{N}_2)$ as in the cut-net removal method.
Then, each cut net $n_s$ of $\Pi_U(\mathcal{H})$ is split into two pin-wise disjoint nets $n_s^1$ and $n_s^2$
with $Pins(n_s^1) = Pins(n_s) \cap \mathcal{U}_1$ and $Pins(n_s^2) = Pins(n_s) \cap \mathcal{U}_2$, where $n_s^1$ and $n_s^2$
are added to the net lists of $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. In this way, the total cutsize
of the resulting multiway partition according to the connectivity metric will be equal
to the sum of the number of cut nets of the bipartition obtained at each RB step [9].
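
One RB step of cut-net splitting can be sketched as follows. This is a minimal illustration, assuming a hypergraph stored as a dictionary from net names to pin sets; the function name and the sample data are hypothetical. Both halves of a split net keep the name of the original net here, and single-pin nets are kept for simplicity.

```python
def split_cut_nets(pins, u1, u2):
    """One RB step with cut-net splitting.

    pins: dict net -> set of nodes; u1, u2: the two node parts.
    Returns (nets of H1, nets of H2, number of cut nets of this step).
    """
    h1, h2, cut = {}, {}, 0
    for net, nodes in pins.items():
        p1, p2 = nodes & u1, nodes & u2
        if p1 and p2:
            cut += 1          # the net has pins on both sides
        if p1:
            h1[net] = p1      # Pins(n) restricted to U1
        if p2:
            h2[net] = p2      # Pins(n) restricted to U2
    return h1, h2, cut


pins = {"n1": {1, 2}, "n2": {2, 3, 4}, "n3": {4, 5}}
h1, h2, cut = split_cut_nets(pins, {1, 2, 3}, {4, 5})
```

Only `n2` is cut in this example, so it appears in both sub-hypergraphs with disjoint pin sets.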

The connectivity metric can be formulated in the RB-GPVS-based multiway HP
by separator-vertex splitting, which is less straightforward than the separator-vertex
removal method and needs special attention. In a straightforward implementation of this
method, a 2-way vertex separator $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2; \mathcal{V}_S\}$ is decoded as generating
two subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$, which are the subgraphs of $\mathcal{G}$ induced by the vertex sets
$\mathcal{V}_1 \cup \mathcal{V}_S$ and $\mathcal{V}_2 \cup \mathcal{V}_S$, respectively. That is, each separator vertex $v_s \in \mathcal{V}_S$ is split
into two vertices $v_s^1$ and $v_s^2$ with $Adj(v_s^1) = Adj(v_s) \cap (\mathcal{V}_1 \cup \mathcal{V}_S)$ and $Adj(v_s^2) =
Adj(v_s) \cap (\mathcal{V}_2 \cup \mathcal{V}_S)$. Then, the split vertices $v_s^1$ and $v_s^2$ are added to the subgraphs
$(\mathcal{V}_1, \mathcal{E}_1)$ and $(\mathcal{V}_2, \mathcal{E}_2)$ to form $\mathcal{G}_1$ and $\mathcal{G}_2$, respectively.

This straightforward implementation of the separator-vertex splitting method can
be overcautious because of the unnecessary replication of separator edges in both
subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$. Here, an edge is said to be a separator edge if the two vertices
connected by the edge are both in the separator $\mathcal{V}_S$. Consider a separator edge
$(v_{s_1}, v_{s_2}) \in \mathcal{E}$ in a given bipartition $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2; \mathcal{V}_S\}$ of $\mathcal{G}$, where $\Pi_U(\mathcal{H}) =
\{\mathcal{U}_1, \mathcal{U}_2\}$ is a bipartition of $\mathcal{H}$ induced by $\Pi_{VS}(\mathcal{G})$ according to the construction given
in (3.5). If both $\mathcal{U}_1$ and $\mathcal{U}_2$ contain at least one node that induces the separator
edge $(v_{s_1}, v_{s_2})$ of $\mathcal{G}$, then the replication of $(v_{s_1}, v_{s_2})$ in both subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$
is necessary. If, however, all hypergraph nodes that induce the edge $(v_{s_1}, v_{s_2})$ of $\mathcal{G}$
remain in only one part of $\Pi_U(\mathcal{H})$, then the replication of $(v_{s_1}, v_{s_2})$ in the graph
corresponding to the other part is unnecessary. For example, if all nodes connected
by both nets $n_{s_1}$ and $n_{s_2}$ of $\mathcal{H}$ remain in $\mathcal{U}_1$ of $\Pi_U(\mathcal{H})$, then the edge $(v_{s_1}, v_{s_2})$
should be replicated in only $\mathcal{G}_1$. $\mathcal{G}_1$ and $\mathcal{G}_2$ constructed in this way become the
NIG representations of hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. Hence, the sum of the
number of separator vertices of the 2-way GPVS obtained at each RB step will be
equal to the total cutsize of the resulting multiway partition of $\mathcal{H}$ according to the
connectivity metric.

Fig. 3.3. Separator-vertex splitting.



Figure 3.3 illustrates three separator vertices $v_{s_1}$, $v_{s_2}$ and $v_{s_3}$ in a 2-way vertex
separator and their splits into vertices $v_{s_1}^1$, $v_{s_2}^1$, $v_{s_3}^1$ and $v_{s_1}^2$, $v_{s_2}^2$, $v_{s_3}^2$. The three
separator vertices $v_{s_1}$, $v_{s_2}$ and $v_{s_3}$ are connected with each other by three separator
edges $(v_{s_1}, v_{s_2})$, $(v_{s_1}, v_{s_3})$ and $(v_{s_2}, v_{s_3})$ in order to show three distinct cases of
separator edge replication in the accurate implementation. The figure also shows four
hypergraph nodes $u_x$, $u_y$, $u_z$ and $u_t$ which induce the three separator edges, where
$u_x$, $u_z$ are assigned to part $\mathcal{U}_1$ and $u_y$, $u_t$ are assigned to part $\mathcal{U}_2$. Since only $u_x$
induces the separator edge $(v_{s_1}, v_{s_2})$ and $u_x$ is assigned to $\mathcal{U}_1$, it is sufficient to
replicate the separator edge $(v_{s_1}, v_{s_2})$ in only $\mathcal{V}_1$. Symmetrically, since only $u_y$ induces
the separator edge $(v_{s_1}, v_{s_3})$ and $u_y$ is assigned to $\mathcal{U}_2$, it is sufficient to replicate
the separator edge $(v_{s_1}, v_{s_3})$ in only $\mathcal{V}_2$. However, since $u_z$ and $u_t$ both induce the
separator edge $(v_{s_2}, v_{s_3})$ and $u_z$ and $u_t$ are respectively assigned to $\mathcal{U}_1$ and $\mathcal{U}_2$, it
is necessary to replicate the separator edge $(v_{s_2}, v_{s_3})$ in both $\mathcal{V}_1$ and $\mathcal{V}_2$.
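
The three cases above can be reproduced with a small sketch of the accurate splitting rule. It is an illustration under assumptions, not the actual implementation: the NIG is a dictionary of adjacency sets, the hypergraph is a dictionary from nets to pin sets, and the two internal nets `a` and `b` are hypothetical additions that give each side one non-separator vertex.

```python
from itertools import combinations


def build_nig(pins):
    """Net intersection graph: nets are vertices; nets sharing a pin are adjacent."""
    adj = {n: set() for n in pins}
    for a, b in combinations(pins, 2):
        if pins[a] & pins[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def split_separator(adj, pins, vk, vs, uk):
    """Subgraph for side k (vertex part vk, node part uk) under accurate splitting."""
    keep = vk | vs
    sub = {v: set() for v in keep}
    for v in keep:
        for w in adj[v] & keep:
            if v in vs and w in vs and not (pins[v] & pins[w] & uk):
                continue  # separator edge induced only by nodes of the other part
            sub[v].add(w)
    return sub


# Nodes x, z are in U1 and y, t are in U2, as in the description of Figure 3.3.
pins = {
    "s1": {"x", "y"},
    "s2": {"x", "z", "t"},
    "s3": {"y", "z", "t"},
    "a": {"x", "z"},   # hypothetical internal net of part 1
    "b": {"y", "t"},   # hypothetical internal net of part 2
}
adj = build_nig(pins)
vs = {"s1", "s2", "s3"}
u1, u2 = {"x", "z"}, {"y", "t"}
g1 = split_separator(adj, pins, {"a"}, vs, u1)
g2 = split_separator(adj, pins, {"b"}, vs, u2)
```

In this example the edge between `s1` and `s2` survives only in `g1`, the edge between `s1` and `s3` only in `g2`, and the edge between `s2` and `s3` in both, and each subgraph coincides with the NIG of the corresponding sub-hypergraph.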

This accurate implementation of the separator-vertex splitting method depends
on the availability of both $\mathcal{H}$ and its NIG representation $\mathcal{G}$ at the beginning of each
RB step. Hence, after each RB step, the sub-hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$ should be
constructed as well as the subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$. We briefly summarize the details of the
proposed implementation method performed at each RB step. A 2-way GPVS is
performed on $\mathcal{G}$ to obtain a vertex separator $\Pi_{VS}(\mathcal{G})$. Then, a node bipartition $\Pi_U(\mathcal{H})$
of $\mathcal{H}$ is constructed according to (3.5) by decoding the vertex separator $\Pi_{VS}(\mathcal{G})$ of $\mathcal{G}$.
Then, the 2-way vertex separator $\Pi_{VS}(\mathcal{G})$ is used together with the node bipartition
$\Pi_U(\mathcal{H})$ to generate subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$ as described above. The sub-hypergraphs
$\mathcal{H}_1$ and $\mathcal{H}_2$ are also constructed for use in subsequent RB steps. An alternative
implementation could first generate sub-hypergraphs $\mathcal{H}_1$ and $\mathcal{H}_2$ from $\Pi_U(\mathcal{H})$
and then construct subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$ from $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively, using
NIG construction. However, this alternative is considerably less efficient than the
proposed implementation, since constructing the NIG representation of a given
hypergraph is computationally expensive.



3.2.2. Balancing constraint. Consider a node partition $\Pi_U(\mathcal{H}) = \{\mathcal{U}_1, \mathcal{U}_2, \ldots, \mathcal{U}_K\}$
of $\mathcal{H}$ constructed from the vertex separator $\Pi_{VS}(\mathcal{G}) = \{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K; \mathcal{V}_S\}$ of NIG $\mathcal{G}$
according to (3.5). Since the vertices of $\mathcal{G}$ correspond to the nets of the given
hypergraph $\mathcal{H}$, it is easy to enforce a balance criterion on the nets of $\mathcal{H}$ by setting
$w(v_i) = w(n_i)$. For example, assuming unit net weights, the partitioning constraint
of balancing the vertex counts of the parts of $\Pi_{VS}(\mathcal{G})$ implies balance among the internal
net counts of the node parts of $\Pi_U(\mathcal{H})$.

However, balance on the nodes of $\mathcal{H}$ cannot be directly enforced during the
GPVS of $\mathcal{G}$, because the NIG model suffers from information loss on hypergraph
nodes. Here, we propose a vertex-weighting model for estimating the cumulative
weight of hypergraph nodes in each vertex part $\mathcal{V}_k$ of the vertex separator $\Pi_{VS}(\mathcal{G})$.
In this model, the objective is to find appropriate weights for the vertices of $\mathcal{G}$ so that
the vertex-part weight $W(\mathcal{V}_k)$ computed according to (2.1) approximates the node-part
weight $W(\mathcal{U}_k)$ computed according to (2.6).



The NIG model can also be viewed as a clique-node model since each node $u_h$
of the hypergraph induces an edge between each pair of vertices corresponding to the
nets that connect $u_h$. So, the edges of $\mathcal{G}$ implicitly represent the nodes of $\mathcal{H}$. Each
hypergraph node $u_h$ of degree $d_h$ induces $\binom{d_h}{2}$ clique edges among which the weight
$w(u_h)$ is distributed evenly. That is, every clique edge induced by node $u_h$ can be
considered as having a uniform weight of $w(u_h) / \binom{d_h}{2}$. Multiple edges between the
same pair of vertices are collapsed into a single edge whose weight is equal to the sum
of the weights of its constituent edges. Hence, the weight $w(e_{ij})$ of each edge $e_{ij}$ of
$\mathcal{G}$ becomes

$$ w(e_{ij}) = \sum_{u_h \in Pins(n_i) \cap Pins(n_j)} \frac{w(u_h)}{\binom{d_h}{2}}. \tag{3.6} $$

Then, the weight of each edge is uniformly distributed between the pair of vertices
connected by that edge. That is, edge $e_{ij}$ contributes $w(e_{ij})/2$ to both $v_i$ and $v_j$.
Hence, in the proposed model, the weight $w(v_i)$ of vertex $v_i$ becomes

$$ w(v_i) = \frac{1}{2} \sum_{v_j \in Adj(v_i)} w(e_{ij}) = \sum_{u_h \in Pins(n_i)} \frac{w(u_h)}{d_h}. \tag{3.7} $$
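
The two routes to the vertex weights (distributing node weights over clique edges and halving each edge weight, versus the closed form $\sum w(u_h)/d_h$) can be compared numerically. The sketch below assumes integer node weights and that every node has degree at least two, since a degree-one node induces no clique edge; the data and function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def vertex_weights_via_edges(nets_of, w):
    """nets_of: node -> list of nets connecting it; w: node -> integer weight."""
    edge_w = defaultdict(Fraction)
    for u, nets in nets_of.items():
        d = len(nets)
        if d < 2:
            continue  # no clique edge is induced by a degree-one node
        share = Fraction(w[u], d * (d - 1) // 2)  # w(u_h) / C(d_h, 2)
        for a, b in combinations(sorted(nets), 2):
            edge_w[(a, b)] += share               # parallel edges are collapsed
    vert_w = defaultdict(Fraction)
    for (a, b), we in edge_w.items():
        vert_w[a] += we / 2                       # each endpoint gets half
        vert_w[b] += we / 2
    return dict(vert_w)


def vertex_weights_closed_form(nets_of, w):
    """Closed form: each net receives w(u_h) / d_h from every node it connects."""
    out = defaultdict(Fraction)
    for u, nets in nets_of.items():
        for n in nets:
            out[n] += Fraction(w[u], len(nets))
    return dict(out)


nets_of = {"u1": ["n1", "n2"], "u2": ["n1", "n2", "n3"], "u3": ["n2", "n3"]}
w = {"u1": 2, "u2": 3, "u3": 4}
by_edges = vertex_weights_via_edges(nets_of, w)
by_formula = vertex_weights_closed_form(nets_of, w)
```

Both routes give the same weights here, and the vertex weights sum to the total node weight, which is what makes vertex-part weights usable as estimates of node-part weights.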

Consider an internal hypergraph node $u_h$ of part $\mathcal{U}_k$ of $\Pi_U(\mathcal{H})$. Since all graph
vertices corresponding to the nets that connect $u_h$ are in part $\mathcal{V}_k$ of $\Pi_{VS}(\mathcal{G})$, $u_h$ will
contribute $w(u_h)$ to $W(\mathcal{V}_k)$. Consider a boundary hypergraph node $u_h$ of part $\mathcal{U}_k$
with an external degree $\delta_h < d_h$, i.e., $u_h$ is connected by $\delta_h$ cut nets. Then $u_h$ will
contribute $(1 - \delta_h/d_h)\,w(u_h)$ to $W(\mathcal{V}_k)$ instead of $w(u_h)$. So, the vertex-part
weight $W(\mathcal{V}_k)$ of $\mathcal{V}_k$ in $\Pi_{VS}(\mathcal{G})$ will be less than the actual node-part weight
$W(\mathcal{U}_k)$ of $\mathcal{U}_k$ in $\Pi_U(\mathcal{H})$. As the vertex-part weights of different parts of $\Pi_{VS}(\mathcal{G})$ will
involve similar errors, the proposed method can be expected to produce a sufficiently
good balance on the node-part weights of $\Pi_U(\mathcal{H})$.
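
The size of this underestimate can be written out explicitly. The derivation below is a sketch under two assumptions: the vertex weights are the sums of $w(u_h)/d_h$ over the pins of the corresponding net, and $\mathcal{U}_k$ denotes the part before any free node is assigned, i.e., $\mathcal{U}_k = Pins(\mathcal{N}_k)$. Every net of a node $u_h \in \mathcal{U}_k$ is then either internal to part $k$ or a cut net, so exactly $d_h - \delta_h$ of its nets lie in $\mathcal{N}_k$.

```latex
\begin{align*}
W(\mathcal{V}_k)
  &= \sum_{n_i \in \mathcal{N}_k} \; \sum_{u_h \in Pins(n_i)} \frac{w(u_h)}{d_h}
   = \sum_{u_h \in \mathcal{U}_k} \frac{d_h - \delta_h}{d_h}\, w(u_h) \\
  &= W(\mathcal{U}_k) - \sum_{u_h \in \mathcal{U}_k} \frac{\delta_h}{d_h}\, w(u_h).
\end{align*}
% Example: w(u_h) = 6, d_h = 3, \delta_h = 1 gives a contribution of (1 - 1/3) \cdot 6 = 4,
% so this boundary node is underestimated by 2.
```

The error term involves only boundary nodes, which is why parts with comparable boundaries incur comparable errors.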

Fig. 3.4. (a) A sample matrix $A$ whose row-net hypergraph representation $\mathcal{H}_A$ is equal to the
sample hypergraph $\mathcal{H}$ given in Figure 3.1(a) and (b) the matrix $Z = AA^T$.

The free nodes can easily be exploited to improve the balance during the
completion of the partial node partition. For the cut-net metric in (2.8), we perform
free-node-to-part assignment after obtaining the $K$-way GPVS, since arbitrary assignments
of free nodes do not disturb the cutsize by Corollary 2. However, for the connectivity
metric in (2.9), free-node-to-part assignment needs special attention if it is performed
after obtaining a $K$-way GPVS. According to Theorem 2, arbitrary assignments of
free nodes may increase the connectivity of cut nets. So, for the connectivity cutsize
metric, we perform free-node-to-part assignment after each RB step to improve the
balance. Note that free-node-to-part assignment performed in this way does not
increase the connectivity of cut nets in the RB-GPVS-based HP by Corollary 3. For both
cutsize metrics, the best-fit-decreasing heuristic [50] used in solving the bin-packing
problem is adapted to obtain a complete node partition/bipartition. Free nodes are
assigned to parts in decreasing weight, where the best-fit criterion corresponds to
assigning a free node to a part that currently has the minimum weight. Initial part
weights are taken as the weights of the parts in the partial node partition/bipartition.
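
The best-fit-decreasing assignment of free nodes described above can be sketched as follows. This is a minimal illustration: the function name, the tie-breaking rule (lowest part index, then node name) and the sample weights are assumptions, not details given in the text.

```python
def assign_free_nodes(part_weights, free_nodes):
    """Best-fit-decreasing assignment of free nodes.

    part_weights: current weights of the parts of the partial node partition.
    free_nodes:   dict node -> weight.
    Returns (node -> part index, final part weights).
    """
    weights = list(part_weights)
    assignment = {}
    # Visit free nodes in decreasing weight (names break ties deterministically).
    for node, w in sorted(free_nodes.items(), key=lambda kv: (-kv[1], kv[0])):
        k = min(range(len(weights)), key=weights.__getitem__)  # lightest part
        assignment[node] = k
        weights[k] += w
    return assignment, weights


assignment, weights = assign_free_nodes([10, 7], {"f1": 4, "f2": 2, "f3": 1})
```

In this example the heaviest free node goes to the lighter part first, and the two parts end with equal weight.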

3.3. Matrix Theoretical View of the Relation Between HP and GPVS. 

We will first briefly discuss the row-net and column-net models we proposed for rep- 
resenting rectangular as well as symmetric and nonsymmetric square matrices in our 
earlier work [5J S3 01] . These two models are duals: the row- net representation 
of a matrix is equal to the column-net representation of its transpose. Here, we only
discuss the row-net model for permuting a matrix $A$ into a primal singly-bordered
block-diagonal (SB) form, whereas the column-net model can be used for permuting
$A$ into a dual SB form. In the row-net hypergraph model, an $M \times N$ matrix $A = (a_{ij})$
is represented as a hypergraph $\mathcal{H}_A = (\mathcal{U}, \mathcal{N})$ on $N$ nodes and $M$ nets with the
number of pins equal to the number of nonzeros in matrix $A$. Node and net sets $\mathcal{U}$ and
$\mathcal{N}$ correspond, respectively, to the columns and rows of $A$. There exist one net $n_i$
and one node $u_j$ for each row $i$ and column $j$, respectively. Net $n_i \subseteq \mathcal{U}$ contains the
nodes corresponding to the columns that have a nonzero entry in row $i$, i.e., $u_j \in n_i$
if and only if $a_{ij} \ne 0$. That is, $Pins(n_i)$ represents the set of columns that have a
nonzero in row $i$ of $A$, and in a dual manner $Nets(u_j)$ represents the set of rows that
have a nonzero in column $j$ of $A$. Figure 3.4(a) shows a $15 \times 18$ matrix $A$ whose
row-net hypergraph representation $\mathcal{H}_A$ is equal to the sample hypergraph $\mathcal{H}$ given
in Figure 3.1(a).



Fig. 3.5. (a) A 3-way DB form of the $AA^T$ matrix; (b) the SB form $A_{SB}$ of $A$ shown in Figure 3.4(a).

Let $\mathcal{G}_{NIG}(\mathcal{H}_A) = (\mathcal{V}, \mathcal{E})$ denote the NIG model for the row-net hypergraph
representation $\mathcal{H}_A = (\mathcal{U}, \mathcal{N})$ of matrix $A$. By definition of the NIG model, the vertices of
$\mathcal{G}_{NIG}$ will represent the rows of $A$, and $(v_i, v_j) \in \mathcal{E}$ if and only if $Pins(n_i) \cap Pins(n_j) \ne \emptyset$.
Since $Pins(n_i)$ represents the set of columns that have a nonzero in row $i$ of $A$,
$Pins(n_i) \cap Pins(n_j) \ne \emptyset$ corresponds to the condition that rows $i$ and $j$ of $A$ have
a nonzero in at least one common column. Let $Z = (z_{ij})$ denote the $M \times M$ matrix
$Z = AA^T$, and let $\langle \cdot, \cdot \rangle$ denote the inner-product operator. Since $z_{ij} = \langle r_i, r_j \rangle$, where
$r_i$ denotes row $i$ of $A$, $z_{ij}$ will be nonzero (assuming no numerical cancellation) if and
only if $(v_i, v_j) \in \mathcal{E}$. Hence, the sparsity pattern of the symmetric matrix $Z$ will
correspond to the adjacency matrix representation of $\mathcal{G}_{NIG}$. In other words, $\mathcal{G}_{NIG}$
will be equivalent to the standard graph representation of the symmetric matrix $Z$, i.e.,
$\mathcal{G}_{NIG}(\mathcal{H}_A) = \mathcal{G}_{AA^T}$. Note that although vertex $v_i$ of $\mathcal{G}_{NIG}$ represents only row $i$ of
$A$, it represents both row $i$ and column $i$ of $AA^T$ in $\mathcal{G}_{AA^T}$.
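
The equivalence between the NIG of the row-net hypergraph and the off-diagonal sparsity pattern of $AA^T$ can be checked on a small example. The sketch below uses plain Python lists and a hypothetical $3 \times 4$ matrix with nonnegative entries, so that no numerical cancellation can occur; the function names are illustrative.

```python
def row_net_pins(a):
    """Pins(n_i): the set of columns with a nonzero in row i."""
    return [{j for j, x in enumerate(row) if x != 0} for row in a]


def nig_edges(pins):
    """Edges (i, j), i < j, between nets that share at least one pin."""
    m = len(pins)
    return {(i, j) for i in range(m) for j in range(i + 1, m) if pins[i] & pins[j]}


def aat_offdiagonal_pattern(a):
    """Positions (i, j), i < j, where Z = A A^T has a nonzero entry."""
    m = len(a)
    pattern = set()
    for i in range(m):
        for j in range(i + 1, m):
            z_ij = sum(x * y for x, y in zip(a[i], a[j]))  # inner product of rows
            if z_ij != 0:
                pattern.add((i, j))
    return pattern


a = [
    [1, 0, 2, 0],
    [0, 3, 0, 0],
    [0, 0, 4, 5],
]
edges = nig_edges(row_net_pins(a))
pattern = aat_offdiagonal_pattern(a)
```

Rows 0 and 2 share column 2, so the only NIG edge and the only off-diagonal nonzero of $AA^T$ are both at position (0, 2).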

Figure 3.4(b) shows the $15 \times 15$ matrix $Z = AA^T$. Note that the standard graph
representation of $Z$ is equivalent to the NIG representation $\mathcal{G}_{NIG}(\mathcal{H}_A)$ of $\mathcal{H}_A$. As
has long been exploited in nested-dissection ordering for sparsity-preserving factorization,
the problem of transforming a symmetric matrix into a DB form through symmetric
row/column permutation can be modeled as a GPVS problem on its standard graph
representation. So, Figure 3.5(a) shows a 3-way DB form of the $AA^T$ matrix induced
by the 3-way GPVS $\Pi_{VS}(\mathcal{G})$ of $\mathcal{G}_{NIG}(\mathcal{H}_A)$ shown in Figure 3.4(b). Recall that the
3-way partition $\Pi_U(\mathcal{H}_A)$ shown in Figure 3.1(b) is induced by $\Pi_{VS}(\mathcal{G})$. Hence, $\Pi_{VS}(\mathcal{G})$
induces the same SB form $A_{SB}$ of $A$ as shown in Figure 3.5(b).



3.4. Multilevel implementation of GPVS-based HP formulation. The
state-of-the-art graph and hypergraph partitioning tools that adopt the multilevel
framework consist of three phases: coarsening, initial partitioning, and uncoarsening.
In the first phase, a multilevel clustering is applied starting from the original
graph/hypergraph by adopting various matching heuristics until the number of
vertices in the coarsened graph/hypergraph falls below a predetermined threshold
value. Clustering corresponds to coalescing highly interacting vertices into supernodes.
In the second phase, a partition is obtained on the coarsest graph/hypergraph using
various heuristics including FM, which is an iterative refinement heuristic proposed
for graph/hypergraph partitioning by Fiduccia and Mattheyses [24] as a faster
implementation of the KL algorithm proposed by Kernighan and Lin [36]. In the third
phase, the partition found in the second phase is successively projected back towards
the original graph/hypergraph by refining the projected partitions on the intermediate-level
uncoarsened graphs/hypergraphs using various heuristics including FM.

One of the most important applications of GPVS is George's nested-dissection
algorithm [25], which has been widely used for reordering the rows/columns of
a symmetric, sparse, and positive definite matrix to reduce fill in the factor matrices.
Here, GPVS is defined on the standard graph model of the given symmetric matrix.
The basic idea in the nested-dissection algorithm is to reorder a symmetric matrix into a
2-way DB form so that no fill can occur in the off-diagonal blocks. The DB form of the
given matrix is obtained through a symmetric row/column permutation induced by
a 2-way GPVS. Then, both diagonal blocks are reordered by applying the dissection
strategy recursively. The performance of the nested-dissection reordering algorithm
depends on finding small vertex separators at each dissection step.

In this work, we adapted and modified the onmetis ordering code of MeTiS [31]
for implementing our GPVS-based HP formulation. onmetis utilizes the RB paradigm for
obtaining a multiway GPVS. Since $K$ is not known in advance for ordering
applications, recursive bipartitioning operations continue until the weight of a part becomes
sufficiently small. In our implementation, we terminate the recursive bipartitioning
process whenever the number of parts becomes $K$.

The separator refinement scheme used in the uncoarsening phase of onmetis
considers vertex moves from the vertex separator $\mathcal{V}_S$ to both $\mathcal{V}_1$ and $\mathcal{V}_2$ in $\Pi_{VS} =
\{\mathcal{V}_1, \mathcal{V}_2; \mathcal{V}_S\}$. During these moves, onmetis uses the following feasibility constraint,
which incorporates the size of the separator in balancing, i.e.,

$$ \max\{W(\mathcal{V}_1), W(\mathcal{V}_2)\} \le (1 + \epsilon)\, \frac{W(\mathcal{V}_1) + W(\mathcal{V}_2) + W(\mathcal{V}_S)}{2} = W_{max}. \tag{3.8} $$

However, this may become a loose balancing constraint compared to (2.2) for relatively
large separator sizes, which are typical during refinements of coarser graphs. This loose
balancing constraint is not an important concern in onmetis, because it is targeted
at fill-reducing sparse matrix ordering, which is not very sensitive to the imbalance
between part sizes. Nevertheless, this scheme degrades the load balancing quality
of our GPVS-based HP implementation, since load balancing is more important in
the applications for which HP is utilized. We modified onmetis by computing the
maximum part weight constraint as

$$ W_{max} = (1 + \epsilon)\, \frac{W(\mathcal{V}_1) + W(\mathcal{V}_2)}{2} \tag{3.9} $$

at the beginning of each FM pass, whereas onmetis computes $W_{max}$ according to (3.8)
once for all FM passes in a level. Furthermore, onmetis maintains only one value for
each vertex, which denotes both the weight and the cost of the vertex. We added a
second field for each vertex to hold the weight and the cost of the vertex separately.
The weights and the costs of vertices are accumulated independently during vertex
coalescings performed by matchings at the coarsening phases. Recall that weight
values are used for maintaining the load balancing criteria, whereas cost values are
used for computing the size of the separator. That is, FM gains of the separator
vertices are computed using the cost values of those vertices.
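
The difference between a bound that includes the separator weight and one that does not can be illustrated numerically. The sketch below assumes that the original bound limits the heavier part by $(1+\epsilon)$ times half of the total weight including the separator, and that the modified bound uses only the two part weights; the function names and the sample weights are hypothetical and are not onmetis code.

```python
def wmax_with_separator(w1, w2, ws, eps):
    """Bound that counts the separator weight in the total (assumed original rule)."""
    return (1 + eps) * (w1 + w2 + ws) / 2


def wmax_parts_only(w1, w2, eps):
    """Bound computed from the two part weights only (modified rule)."""
    return (1 + eps) * (w1 + w2) / 2


# Hypothetical coarse-level bipartition with a relatively large separator.
w1, w2, ws, eps = 50, 30, 20, 0.10

loose_bound = wmax_with_separator(w1, w2, ws, eps)
tight_bound = wmax_parts_only(w1, w2, eps)
feasible_loose = max(w1, w2) <= loose_bound
feasible_tight = max(w1, w2) <= tight_bound
```

With these numbers the heavier part (weight 50) passes the looser bound of about 55 but violates the tighter bound of about 44, although it is 25% above the average part weight of 40.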

The GPVS-based HP implementation obtained by adapting onmetis as described
in this subsection will be referred to as onmetisHP.

4. Experimental Results. We test the performance of our GPVS-based HP 
formulation by partitioning matrices from the linear-programming (LP) and the posi- 
tive definite (PD) matrix collections of the University of Florida matrix collection [TS] . 
Matrices in the latter collection are square and symmetric, whereas the matrices in
the former collection are rectangular. The row-net hypergraph models [9], [13] of the
test matrices constitute our test set. In these hypergraphs, nets are associated with
unit cost. To show the validity of our GPVS-based HP formulation, test hypergraphs
are partitioned by both PaToH and onmetisHP, and default parameters are utilized
in both tools. In general, the maximum imbalance ratio $\epsilon$ was set to 10%.

We excluded small matrices that have fewer than 1000 rows or 1000 columns. In
the LP matrix collection, there were 190 large matrices out of 342 matrices. Out of
these 190 large matrices, 5 duplicates, 1 extremely large matrix and 5 matrices for
which NIG representations are extremely large were excluded. We also excluded 26
outlier matrices which yield large separators¹ to avoid skewing the results. Thus, 153
test hypergraphs are used from the LP matrix collection. In the PD matrix collection,
there were 170 such large matrices out of 223 matrices. Out of these 170 large
matrices, 2 duplicates, 2 matrices for which NIG representations are extremely large
and 7 matrices with large separators were excluded. Thus, 159 test hypergraphs are
used from the PD matrix collection. We experimented with $K$-way partitioning of test
hypergraphs for $K = 2, 4, 8, 16, 32, 64,$ and $128$. For a specific $K$ value, $K$-way
partitioning of a test hypergraph constitutes a partitioning instance. For the LP collection,
instances in which $\min\{|\mathcal{U}|, |\mathcal{N}|\} < 50K$ are discarded, as the parts would become
too small. So, 153, 153, 153, 153, 135, 100, and 65 hypergraphs are partitioned
for $K = 2, 4, 8, 16, 32, 64,$ and $128$, respectively, for the LP collection. Similarly, for
the PD collection, instances in which $|\mathcal{U}| < 50K$ are discarded. So, 159, 159, 159,
159, 145, 131, and 109 hypergraphs are partitioned for $K = 2, 4, 8, 16, 32, 64,$ and
$128$, respectively, for the PD collection. In this section, we summarize our findings
in these experiments. Please refer to [35] for detailed experimental results for each
partitioning instance.

The hypergraphs obtained from the LP matrix collection are used for permuting 
the matrices into singly-bordered (SB) block-angular-form for coarse-grain paralleliza- 
tion of linear-programming applications pjj. Here, minimizing the cutsize according 
to the cut-net metric (2.4) corresponds to minimizing the size of the row border in the 
induced SB form. In these applications, nets are either associated with unit weights 
or weights that are equal to the nonzeros in the respective rows. In the former case, 
net balancing corresponds to balancing the row counts of the diagonal blocks, whereas 
in the latter case, net balancing corresponds to balancing the nonzero counts of the 
diagonal blocks. Experimental comparisons are provided only for the former case, 
because PaToH does not support different cost and weight associations to nets. 

The hypergraphs obtained from the PD matrix collection are used for minimizing
communication overhead in the column-parallel matrix-vector multiply algorithm in
iterative solvers. Here, minimizing the cutsize according to the connectivity metric (2.5)
corresponds to minimizing the total communication volume when the point-to-point
inter-processor communication scheme is used [5]. Minimizing the cutsize according
to the cut-net metric (2.4) corresponds to minimizing the total communication volume
when the collective communication scheme is used [13]. In these applications, nodes
are associated with weights that are equal to the number of nonzeros in the respective
columns. So, balancing part weights corresponds to computational load balancing.

In the following tables, the performance figures are computed and displayed as
follows. Since both the PaToH and onmetisHP tools involve randomized heuristics,
10 different partitions are obtained for each partitioning instance, and the geometric
average of the 10 resultant partitions is computed as the representative result for
both HP tools on the particular partitioning instance. For each partitioning instance,
the cutsize value is normalized with respect to the total number of nets in the
respective hypergraph. Recall that all test hypergraphs have unit-cost nets. So, for the
cut-net metric, these normalized cutsize values show the fraction of the cut nets. For
the connectivity metric, these normalized cutsize values show the average net
connectivity. For each partitioning instance, the running time of PaToH is normalized with
respect to that of onmetisHP, thus showing the speedup obtained by onmetisHP
for that partitioning instance. These normalized cutsize values and speedup values
as well as percent load imbalance values are summarized in the tables by taking the
geometric averages for each $K$ value.

¹ Here, a separator is said to be large if it includes more than 33% of all nets.
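
The per-instance summary described above can be sketched in a few lines. This is one plausible reading rather than the exact procedure used in the experiments: it assumes that the geometric mean is taken over the runs first and the normalization is applied afterwards, and that all cutsizes and times are positive. The function names and the numbers are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize_instance(cutsizes, num_nets, patoh_times, onmetishp_times):
    """Normalized cutsize of one tool and the speedup of onmetisHP over PaToH."""
    norm_cut = geometric_mean(cutsizes) / num_nets
    speedup = geometric_mean(patoh_times) / geometric_mean(onmetishp_times)
    return norm_cut, speedup


# Two runs instead of ten, to keep the example short.
norm_cut, speedup = summarize_instance(
    cutsizes=[20, 80], num_nets=1000,
    patoh_times=[4.0, 9.0], onmetishp_times=[2.0, 2.0],
)
```

Here the geometric mean of the cutsizes is 40, giving a normalized cutsize of 0.04, and the speedup is 6.0 / 2.0 = 3.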

Table 4.1
Performance averages on the LP matrix collection for cut-net metric with net balancing.

             PaToH               onmetisHP
   K     cutsize    %LI     cutsize    %LI     speedup
   2      0.02      1.2%     0.03      0.3%     2.04
   4      0.02      1.9%     0.05      2.6%     2.45
   8      0.07      3.1%     0.09      6.9%     2.64
  16      0.09      5.2%     0.14     13.0%     2.78
  32      0.13      8.8%     0.18     23.1%     2.83
  64      0.15     11.5%     0.21     27.8%     2.83
 128      0.16     13.5%     0.21     31.3%     2.76

Table 4.1 displays overall performance averages of onmetisHP compared to those
of PaToH for the cut-net metric (see (2.8)) with net balancing on the LP matrix
collection. As seen in Table 4.1, onmetisHP obtains hypergraph partitions of
comparable cutsize quality with those of PaToH. However, the load balancing quality of
partitions produced by onmetisHP is worse than that of PaToH, especially with
increasing $K$. As seen in the table, onmetisHP runs significantly faster than PaToH
for each $K$. For example, onmetisHP runs 2.83 times faster than PaToH for
32-way partitionings on average.

Table 4.2
Performance averages on the PD matrix collection for cut-net metric with node balancing.

             PaToH                              onmetisHP
   K     cutsize    %LI     cutsize   exp%LI_p   act%LI_p   act%LI_c   speedup
   2      0.01      0.1%     0.01       0.2%       0.2%       0.1%      1.40
   4      0.03      0.3%     0.03       0.9%       1.5%       1.1%      1.75
   8      0.05      0.4%     0.05       2.8%       3.7%       2.7%      1.96
  16      0.08      0.6%     0.08       6.7%       7.4%       5.4%      1.98
  32      0.12      0.9%     0.12      13.4%      12.8%       9.2%      2.17
  64      0.17      1.2%     0.16      22.1%      19.8%      13.5%      2.27
 128      0.25      1.6%     0.24      32.5%      28.8%      17.9%      2.25



Table 4.2 displays overall performance averages of onmetisHP compared to those
of PaToH for the cut-net metric with node balancing on the PD matrix collection. In
the table, exp%LI_p and act%LI_p respectively denote the expected and actual percent
load imbalance values for the partial node partitions of the hypergraphs induced by
the $K$-way GPVS. act%LI_c denotes the actual load imbalance values for the complete
node partitions obtained after free-node-to-part assignment. The small discrepancies
between the exp%LI_p and act%LI_p values show the validity of the approximate
weighting scheme proposed in Section 3.2 for the vertices of the NIG. As seen in
the table, for each $K$, the act%LI_c value is considerably smaller than the act%LI_p
value. This experimental finding confirms the effectiveness of the free-node-to-part
assignment scheme mentioned in Section 3.2. As seen in Table 4.2, onmetisHP
obtains hypergraph partitions of comparable cutsize quality with those of PaToH.
However, the load balancing quality of partitions produced by onmetisHP is considerably
worse than that of PaToH. As seen in the table, onmetisHP runs considerably
faster than PaToH for each $K$.

Table 4.3
Comparison of accurate and overcautious separator-vertex splitting implementations with
averages on the PD matrix collection for connectivity metric with node balancing.

            overcautious / accurate
   K     cutsize    %LI     speedup
   2      1.00      0.63     1.29
   4      1.02      0.79     1.50
   8      1.10      0.79     1.61
  16      1.29      0.70     1.63
  32      1.56      0.64     1.61
  64      1.84      0.69     1.60
 128      2.09      0.60     1.54


Table 4.3 is constructed based on the PD matrix collection to show the validity
of the accurate vertex splitting formulation proposed in Section 3.2.1 for the
connectivity cutsize metric (see (2.9)). In this table, the speedup, cutsize and load imbalance
values of onmetisHP that uses the straightforward (overcautious) separator-vertex
splitting implementation are normalized with respect to those of onmetisHP that
uses the accurate implementation. In the straightforward implementation,
free-node-to-part assignment is performed after obtaining a $K$-way GPVS, since hypergraphs
are not carried through the RB process. Free nodes are assigned to parts in
decreasing weight, where the best-fit criterion corresponds to assigning a free node to a
part which increases the connectivity cutsize by the smallest amount, with ties broken
in favor of the part with minimum weight. As seen in the table, the overcautious
implementation leads to slightly better load balance than the accurate implementation,
because the overcautious implementation performs free-node-to-part assignment on the
$K$-way partial node partition induced by the $K$-way GPVS. As also seen in the table,
the overcautious implementation, as expected, leads to slightly better speedup
than the accurate implementation. However, the accurate implementation leads to
significantly smaller cutsize values.

Table 4.4
Performance averages on the PD matrix collection for connectivity metric with node balancing.

             PaToH               onmetisHP
   K     cutsize    %LI     cutsize    %LI     speedup
   2      1.03      0.1%     1.03      0.2%     1.29
   4      1.08      0.3%     1.08      0.8%     1.50
   8      1.15      0.5%     1.15      1.7%     1.61
  16      1.26      0.7%     1.25      4.1%     1.63
  32      1.37      1.0%     1.36      7.9%     1.61
  64      1.49      1.5%     1.47     11.8%     1.60
 128      1.63      1.9%     1.60     16.5%     1.54



Table 4.4 displays overall performance averages of onmetisHP compared to those
of PaToH for the connectivity metric with node balancing on the PD matrix
collection. In contrast to Table 4.2, load imbalance values are not displayed for partial
node partitions in Table 4.4, because free-node-to-part assignments are performed
after each 2-way GPVS operation for the sake of accurate implementation of the
separator-vertex splitting method as mentioned in Section 3.2. So, the %LI values
displayed in Table 4.4 show the actual percent imbalance values for the $K$-way node
partitions obtained. As seen in Table 4.4, similar to the results of Table 4.2, onmetisHP
obtains hypergraph partitions of comparable cutsize quality with those of PaToH,
whereas the load balancing quality of partitions produced by onmetisHP is considerably
worse than that of PaToH. As seen in Table 4.4, onmetisHP still runs considerably
faster than PaToH for each $K$ for the connectivity metric. However, the speedup
values in Table 4.4 are considerably smaller than those displayed in Table 4.2,
which is due to the fact that onmetisHP carries hypergraphs during the RB process
for the sake of accurate implementation of the separator-vertex splitting method as
mentioned in Section 3.2.



A common property of Tables 4.1, 4.2, and 4.4 is the increasing speedup of
onmetisHP compared to PaToH with increasing K values. This experimental finding
stems from the fact that the initial NIG construction overhead amortizes with
increasing K. Another common property of Tables 4.1, 4.2, and 4.4 is that
onmetisHP runs significantly faster than PaToH, while producing partitions of
comparable cutsize quality with, however, worse load balancing quality. These
experimental findings justify our GPVS-based hypergraph partitioning formulation
for effective parallelization of applications in which the computational balance
definition is not very precise and the preprocessing overhead due to partitioning
is important.
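
The NIG construction whose one-time cost is amortized over the recursive bipartitioning steps can be sketched as follows. This is a minimal illustration of the definition used in this work (one NIG vertex per net, an edge between two vertices whenever their nets share a node); the function name and data layout are hypothetical and do not reflect the onmetisHP implementation.

```python
from itertools import combinations


def build_nig(nets):
    """Build the net intersection graph of a hypergraph.

    nets : list of pin lists (hypothetical layout, one list of nodes per net)
    Returns a dict mapping each net index to the set of net indices
    that share at least one node with it.
    """
    # Invert the net -> nodes lists into node -> incident nets.
    nets_of_node = {}
    for n, pins in enumerate(nets):
        for v in set(pins):  # set() guards against repeated pins
            nets_of_node.setdefault(v, []).append(n)

    # Every pair of nets incident to the same node becomes an NIG edge.
    adj = {n: set() for n in range(len(nets))}
    for incident in nets_of_node.values():
        for a, b in combinations(incident, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


nets = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]
print(build_nig(nets))
```

A node of degree d contributes up to d(d-1)/2 NIG edges in this sketch, so high-degree nodes dominate the construction cost. The graph is built once, before the first bipartitioning step.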

5. Conclusions. We have presented how the hypergraph partitioning problem
can be efficiently and effectively solved by finding vertex separators on the net
intersection graph representation of a hypergraph. Our empirical study on a wide set
of test matrices showed that runtimes can be as much as 4.17 times faster, while
cutsize quality is preserved on average (and improved in many cases); balance
was achieved for small numbers of parts and remained acceptable for large numbers
of parts. Moreover, we proposed techniques that can trade off cutsize and runtime
against balance, showing that balance can be achieved even for a high number of
parts. Overall, the results show that the proposed hypergraph partitioning through
vertex separators on graphs is ideal for applications where balance is not
well-defined, which is the main motivation for our work, and competitive for
applications where balance is important.
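
As an illustration of the correspondence summarized above, the sketch below decodes a 2-way vertex separator on the NIG into a node bipartition of the hypergraph. It assumes the usual reading of the formulation: a node is assigned to the side that any of its non-separator nets lies on, and a node whose nets all lie in the separator is a free node that may be assigned to either part. The function name and the encoding of sides (0, 1, and 'S' for the separator) are hypothetical.

```python
def decode_separator(nets, side):
    """Turn a 2-way vertex separator on the NIG into a node bipartition.

    nets : list of pin lists (hypothetical layout, one list of nodes per net)
    side : side[n] is 0, 1, or 'S' for the NIG vertex of net n
    Returns (node_part, free_nodes, separator_nets).
    """
    # Collect, for every node, the sides of the nets it belongs to.
    touched = {}
    for n, pins in enumerate(nets):
        for v in pins:
            touched.setdefault(v, set()).add(side[n])

    node_part, free_nodes = {}, []
    for v, sides in touched.items():
        parts = sides - {'S'}
        if len(parts) > 1:
            # Two nets sharing node v are adjacent in the NIG, so they
            # cannot lie on opposite sides of a valid separator.
            raise ValueError(f"node {v} connects both sides: not a separator")
        if parts:
            node_part[v] = next(iter(parts))
        else:
            free_nodes.append(v)  # all nets of v are in the separator

    # Only separator nets can end up cut by the resulting node partition.
    separator_nets = [n for n in range(len(nets)) if side[n] == 'S']
    return node_part, free_nodes, separator_nets


nets = [[0, 1], [1, 2, 6], [2, 3]]
print(decode_separator(nets, [0, 'S', 1]))
```

In this sketch the free nodes are returned unassigned, since where they go is a balancing decision: they can be placed in either part without cutting any net outside the separator.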

We believe that the success of the proposed methods points to several future
research directions. First, better vertex weighting schemes to approximate the node
balance are an area that can make a significant impact. We believe exploiting
domain-specific information or devising techniques that apply to certain classes
of graphs, as opposed to constructing generic approximations that work for all
graphs, is a promising avenue to explore. Secondly, the algorithms we have used in
this paper were only slightly adjusted for the particular problem we were solving.
There is a lot of room for improvement in algorithms for finding vertex separators
with balanced hypergraph partitions, and we believe these algorithms can be designed
and implemented on the existing graph partitioning frameworks, which
means strong algorithmic ideas can be translated into effective software tools with
relatively little effort. Finally, this paper is only an example of the growing
importance of graph partitioning and the need for more flexible models for graph
partitioning. Graph partitioning is now an internal step for divide-and-conquer
based methods, whose popularity will only increase with growing problem sizes.
As such, requirements for graph partitioning will keep growing and broadening.
While the state of the art for graph partitioning has drastically improved from
the days of merely minimizing the number of cut edges while keeping the number of
vertices balanced between the two parts, we believe there is still a lot of room
for growth for more general models for graph partitioning.